dplyr: Manipulating Data
tidyr: Reshaping and Transforming Data
purrr & Functions
Always load all packages at the beginning of a script
tidyverse loads: dplyr, forcats (factors), ggplot2, lubridate, purrr, readr, stringr, tibble, and tidyr
conflicts() function to figure out what conflicts you have
package::fnName() to call a function directly without loading a package / to override conflicts
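As a small illustration of the two points above, the sketch below calls a function through its namespace without attaching the package, then lists the names that are masked in the current session. `tools::file_ext()` and the file name are stand-ins chosen for this example, not something taken from the course materials.

```r
# Call a function by its qualified name; the tools package is never attached.
ext <- tools::file_ext("bfi_clean.csv")  # hypothetical file name

# conflicts() returns the names defined in more than one attached package.
masked <- conflicts()

# detail = TRUE groups those names by where they sit on the search path.
masked_by_pkg <- conflicts(detail = TRUE)

print(ext)
head(masked)
```

The same pattern resolves masking: `dplyr::filter()` and `stats::filter()` are unambiguous regardless of the order in which packages were loaded.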
kableExtra should be loaded before tidyverse, but then tidyverse masks kableExtra::group_rows()
The best way to do something is a way that you understand; otherwise, you can introduce the very mistakes you’re trying to prevent
bfi %>%
  mutate(
    sid = 1:n(),
    E = rowMeans(pick(matches("E\\d")), na.rm = TRUE),
    A = rowMeans(pick(matches("A\\d")), na.rm = TRUE),
    C = rowMeans(pick(matches("C\\d")), na.rm = TRUE),
    N = rowMeans(pick(matches("N\\d")), na.rm = TRUE),
    O = rowMeans(pick(matches("O\\d")), na.rm = TRUE)
  ) %>%
  ungroup() %>%
  select(sid, E:O)

bfi %>%
  mutate(sid = 1:n()) %>%
  pivot_longer(
    cols = c(-sid, -gender, -education, -age)
    , names_to = c("trait", "item")
    , names_sep = -1
    , values_to = "value"
  ) %>%
  group_by(sid, trait) %>%
  summarize(value = mean(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = "trait", values_from = "value") %>%
  ungroup()

HLM / MLM / MEM:
RE ex: time, trial, stimuli, group, study, day, w/in person conditions
FE ex: gender, baseline age, b/w subject conditions, country, etc. (can also be an RE)
| ID | RE1 | RE2 | DV | FE1 |
|---|---|---|---|---|
| 1 | 1 | 1 | 4 | 3 |
| 1 | 1 | 2 | 3 | 3 |
| 1 | 2 | 1 | 2 | 3 |
| 1 | 2 | 2 | 1 | 3 |
| 2 | 1 | 1 | 5 | 1 |
| 2 | 1 | 2 | 3 | 1 |
| 2 | 2 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 1 |
| ID | RE_1_1 | RE_1_2 | RE_2_1 | RE_2_2 | FE1 |
|---|---|---|---|---|---|
| 1 | 4 | 3 | 2 | 1 | 3 |
| 2 | 5 | 3 | 1 | 2 | 1 |
Note this is only possible / easy because of the naming scheme! If we had them named “RE1_1”, this would not have been possible / would have been CONSIDERABLY more difficult
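To make the naming point concrete, here is a minimal sketch that rebuilds the wide table above and reshapes it back to long form. The object names `wide` and `long` are hypothetical, the fixed-effect column is labeled `FE1` to match the long table, and the call relies on the "RE_" prefix and the "_" separator being used consistently.

```r
library(tidyr)

# The wide table from above (one row per ID)
wide <- data.frame(
  ID     = c(1, 2),
  RE_1_1 = c(4, 5),
  RE_1_2 = c(3, 3),
  RE_2_1 = c(2, 1),
  RE_2_2 = c(1, 2),
  FE1    = c(3, 1)
)

# Drop the "RE_" prefix, then split what is left ("1_1", "1_2", ...) at "_"
long <- pivot_longer(
  wide
  , cols = starts_with("RE_")
  , names_prefix = "RE_"
  , names_to = c("RE1", "RE2")
  , names_sep = "_"
  , values_to = "DV"
)

long
```

Because every pivoted column follows the same pattern, a single separator is enough to recover both random-effect indices. Note that `RE1` and `RE2` come back as character columns in this sketch.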
bfi %>%
  mutate(sid = 1:n()) %>%
  pivot_longer(
    cols = c(-sid, -gender, -education, -age)
    , names_to = c("trait", "item")
    , names_sep = -1
    , values_to = "value"
  ) %>%
  group_by(sid, trait) %>%
  summarize(value = mean(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = "trait", values_from = "value") %>%
  ungroup()
# A tibble: 2,800 × 6
sid A C E N O
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 3.4 3.2 3.4 2.8 3.8
2 2 3.6 4 3 3.8 3.2
3 3 4.4 4 3.8 3.6 3.6
4 4 4.8 4.2 4 2.8 3.6
5 5 3.4 3.6 3.6 3.2 3.2
6 6 5.6 4.4 4 3 3.8
7 7 4 3.6 4.2 1.4 3.8
8 8 2.8 3 3.2 4.2 3.4
9 9 3.8 4.8 3.75 3.6 5
10 10 4.8 4 3.6 4.2 3.6
# ℹ 2,790 more rows
nested_RQ1, RQ1_mods, raw_df
tmp and overwrite it as many times as is useful
rm(tmp)
| cat | name | scale | long_name | lab |
|---|---|---|---|---|
| Outcome | dementia | 0/1 | Clinical Dementia | OR [CI] |
| Outcome | braak | 1-5 | Braak Stage | est. [CI] |
| Predictor | E | 0-10 | Extraversion | |
| Predictor | C | 0-10 | Conscientiousness | |
| Moderator | age | num | Baseline Age | |
| Moderator | ses | 1-7 | Baseline SES | |
forcats.
out <- tribble(
  ~cat, ~name, ~scale, ~long_name, ~lab,
  "Outcome", "dementia", "0/1", "Clinical Dementia", "OR [CI]",
  "Outcome", "braak", "1-5", "Braak Stage", "est. [CI]"
)
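A lookup table like this can drive both the display names and the display order of a factor. The base-R sketch below shows the idea on its own; `out_df` and `tab_demo` are hypothetical stand-ins (a trimmed copy of the lookup table and a made-up results table), named differently so they do not overwrite the objects used in these slides.

```r
# Trimmed copy of the lookup table, as a base-R data frame
out_df <- data.frame(
  name      = c("dementia", "braak"),
  long_name = c("Clinical Dementia", "Braak Stage")
)

# Hypothetical results: one row per fitted model, in arbitrary order
tab_demo <- data.frame(outcome = c("braak", "dementia", "braak"))

# levels sets the order (the lookup table's row order);
# labels swaps in the display names
tab_demo$long_out <- factor(
  tab_demo$outcome,
  levels = out_df$name,
  labels = out_df$long_name
)

levels(tab_demo$long_out)
tab_demo[order(tab_demo$long_out), ]
```

Reordering the rows of the lookup table is then the only change needed to reorder every table and plot that uses the factor.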
tab %>%
  left_join(out %>% select(outcome = name, long_out = long_name, lab)) %>%
  mutate(long_out = factor(long_out, levels = out$long_name))
# mutate(long_out = factor(outcome, levels = out$name, labels = out$long_name))
Remember this?
terms <- tribble(
  ~path, ~new, ~level,
  "i~1", "Intercept", "Fixed",
  "s~1", "Slope", "Fixed",
  "i~~i", "Intercept Variance", "Random",
  "s~~s", "Slope Variance", "Random",
  "i~~s", "Intercept-Slope Covariance", "Random"
)
# assumes lavaan, dplyr, and tidyr are already loaded
extract_fun <- function(m, trait){
  p <- parameterEstimates(m) %>%
    data.frame()
  # saveRDS(p, file = sprintf("results/summary/%s.RDS", trait))
  p %>%
    unite(path, lhs, op, rhs, sep = "") %>%  # e.g., "i", "~~", "s" becomes "i~~s"
    filter(path %in% terms$path) %>%
    left_join(terms, by = "path") %>%        # join on path explicitly
    select(term = new, est, ci.lower, ci.upper, pvalue) %>%
    mutate(term = factor(term, levels = terms$new)) %>%
    arrange(term)
}
tribble()
googlesheets4 is also a package dedicated to helping you read, write, and parse Google Sheets
dementia-E-age-unadj.RDS)
broom::tidy(), coef(), etc.), predicted values, random effects, etc. using the same file structure, and you always have everything at your fingertips
lavaan. With slight modifications, it could also work for broom::tidy() output
format_fun <- function(d){
  # round_fun() and pround_fun() are custom helpers that are not shown here
  d %>%
    mutate(sig = ifelse(pvalue < .05, "sig", "ns")) %>%
    rowwise() %>%
    mutate_at(vars(est, ci.lower, ci.upper), round_fun) %>%
    mutate_at(vars(pvalue), pround_fun) %>%
    ungroup() %>%
    mutate(CI = sprintf("[%s,%s]", ci.lower, ci.upper)) %>%
    mutate_at(vars(est, CI, pvalue), ~ifelse(sig == "sig" & !is.na(sig), sprintf("<strong>%s</strong>", .), .))
}
.R script that you can “source” (source("custom_functions.R"))
z_scale(), pomp_score()) and use case specific ones (e.g., lavaan_format_fun() or broom_format_fun())
.R script can be included in the repo)
R
tibbles help with this but don’t always play nice
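Returning to the sourced script of custom functions mentioned above: the names `z_scale()` and `pomp_score()` come from these notes, but their definitions are not shown, so the bodies below are only a guess at conventional versions (standardizing to mean 0 and SD 1, and percent of maximum possible). Treat this as a sketch of what a `custom_functions.R` file might contain, not as the course's actual helpers.

```r
# custom_functions.R (hypothetical contents)

# Standardize a numeric vector to mean 0, SD 1
z_scale <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Percent of maximum possible: rescale to 0-100 using the scale's bounds.
# By default the observed range is used; pass the theoretical bounds
# (e.g., 1 and 5 for a 5-point item) when they are known.
pomp_score <- function(x,
                       min_val = min(x, na.rm = TRUE),
                       max_val = max(x, na.rm = TRUE)) {
  (x - min_val) / (max_val - min_val) * 100
}

x <- c(1, 3, 5)
z_scale(x)
pomp_score(x, min_val = 1, max_val = 5)
```

In an analysis script, a single `source("custom_functions.R")` call near the top would then make both helpers available alongside the loaded packages.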
Posit Cheatsheets
PSC 290 - Data Management and Cleaning